Fungal communities represent the majority of root-specific transcripts in the transcriptomes of Agave plants grown in semiarid regions

Agave plants present drought resistance mechanisms, commercial applications, and potential for bioenergy production. Currently, Agave species are used to produce alcoholic beverages and sisal fibers in semi-arid regions, mainly in Mexico and Brazil. Because of their high productivities, low lignin content, and high shoot-to-root ratio, agaves show potential as biomass feedstock to bioenergy production in marginal areas. Plants host many microorganisms and understanding their metabolism can inform biotechnological purposes. Here, we identify and characterize fungal transcripts found in three fiber-producing agave cultivars (Agave fourcroydes, A. sisalana, and hybrid 11648). We used leaf, stem, and root samples collected from the agave germplasm bank located in the state of Paraiba, in the Brazilian semiarid region, which has faced irregular precipitation periods. We used data from a de novo assembled transcriptome assembly (all tissues together). Regardless of the cultivar, around 10% of the transcripts mapped to fungi. Surprisingly, most root-specific transcripts were fungal (58%); of these around 64% were identified as Ascomycota and 28% as Basidiomycota in the three communities. Transcripts that code for heat shock proteins (HSPs) and enzymes involved in transport across the membrane in Ascomycota and Basidiomycota, abounded in libraries generated from the three cultivars. Indeed, among the most expressed transcripts, many were annotated as HSPs, which appear involved in abiotic stress resistance. Most HSPs expressed by Ascomycota are small HSPs, highly related to dealing with temperature stresses. Also, some KEGG pathways suggest interaction with the roots, related to transport to outside the cell, such as exosome (present in the three Ascomycota communities) and membrane trafficking, which were further investigated. We also found chitinases among secreted CAZymes, that can be related to pathogen control. We anticipate that our results can provide a starting point to the study of the potential uses of agaves’ fungi as biotechnological tools.


INTRODUCTION
Plants host many microorganisms that have crucial roles in plant development, growth, adaptability, and diversity (Trivedi et al., 2020). Plant microbiomes are composed of endophytes-which colonize the tissues and occupy the intra and intercellular spaces in at least one period of their life cycle-and epiphytes-which colonize the vegetal surface (Fitzpatrick et al., 2020). Understanding these microbial communities' structures and their interaction mechanisms with plants can lead to phenotypes of interest, such as drought resistance and plant growth promoters, assisting in higher productivities (Wani et al., 2015).
Agaves are drought-resistant semiarid plants with commercial uses and potential to be used as feedstock for bioenergy production in marginal areas. Currently, agaves are only used to produce sisal fibers (mainly in Brazil) and alcoholic beverages, such as tequila and mezcal (in Mexico). However, agaves also show promise as feedstock for bioenergy production in marginal areas because of high productivities (Owen, Fahy & Griffiths, 2016), low lignin content (Yang et al., 2015), and high shoot-to-root ratio (Borland et al., 2009). Agaves also have several drought resistance mechanisms, such as the crassulacean acid metabolism (CAM), the most water-use efficient photosynthesis (Borland et al., 2009). Additionally, they possess retractable roots (Blunden, Yi & Jewers, 1973), waxy epidermis, and sunken stomata (Davis & Long, 2015) to avoid water loss.
Agave's morphological and physiological adaptations to dry climates have been more studied than other aspects, such as molecular mechanisms and genetics. Some recent research has focused on the molecular aspects of agave's CAM metabolism and drought resistance mechanisms, mainly with the species A. tequilana Weber var. azul and A. americana (Gross et al., 2013;Yang et al., 2015;Abraham et al., 2016). Considering fiber-producing agave cultivars, such as A. fourcroydes, A. sisalana, and hybrid 11648 ((A. angustifolia x A. amaniensis) x A. amaniensis), there are just a few recent studies available (Huang et al., 2018;Huang et al., 2019;Sarwar et al., 2019;Raya et al., 2021), mostly approaching plant physiological mechanisms related to drought or cell wall biosynthesis. However, one aspect still to be vastly explored is the microbiome.
In agave plants, tolerance to heat and drought stress partly might be due to the microorganisms that inhabit them. Indeed, a study found the mycobiota from A. tequilana and A. salmiana produced volatile organic compounds that improved plant growth (Camarena-Pozos et al., 2021). Thus, knowing these microorganisms and their possible impacts on plants becomes relevant. In agave, classical protocols for isolation and/or inoculation of growth promoting microorganisms were performed (Pimienta- Barrios, Zanudo-Hernandez & Lopez-Alcocer, 2009;Ruiz et al., 2013;Martínez-Rodríguez et al., 2014;Quinones-Aguilar et al., 2016;Montoya Martinez et al., 2019), and more recently metagenomics and metatranscriptomics approaches were applied (Coleman-Derr et al., 2016;Citlali et al., 2018;Flores-Núñez et al., 2020). In Brazil, three studies explored the microbial community of Agave sisalana but used only classical microbiology methods to isolate endophytic fungi and prokaryotes (Candeias et al. 2016;Damasceno et al., 2019).
Metatranscriptomics can identify genes expressed in microbiomes and, therefore, possible plant-microbe interactions. However, there are some challenges in metatranscriptomics such as dependence on the availability of information in databases for taxonomic characterization (Shakya, Lo & Chain, 2019) and complications in RNA extraction depending on the type of sample (e.g., soil) (Mukherjee & Reddy, 2020). Furthermore, most meta-omics tools are better applied for prokaryotic organisms (Shakya, Lo & Chain, 2019).
In our previous work (Raya et al., 2021), we assembled and analyzed the comprehensive transcriptomes (leaf, stem, and root tissues) of three fiber-producing agave cultivars collected at noon in a germplasm bank located in Monteiro, Paraíba, Brazil. This region faced an irregular rainfall regime before sampling. Two of the cultivars are the most used in fiber production in Brazil-A. sisalana and hybrid 11648-and one is the most used in Mexico-A. fourcroydes.
In this paper, we used bioinformatic analyses to identify and characterize fungal communities found on three agave transcriptomes. We (1) assessed the number of fungal transcripts, their location in the plant tissues, and their expression values; (2) assessed their taxonomic affiliations; (3) performed functional annotation, focusing on transporters and carbohydrate-degrading enzymes (CAZymes), and (4) investigated potential heat shock proteins.

Transcriptome sequencing, assembly, and quantification
Samples of A. fourcroydes, A. sisalana, and hybrid 11648 were collected at the city of Monteiro, state of Paraíba, Brazil, at the agave germplasm bank owned by Embrapa (Brazilian Agricultural Research Corporation). The germplasm bank is at a semi-arid climate area with non-calcic brown soil. Before sampling, the precipitation regime was highly irregular (Fig. S1); sampling occurred on July 6th, 2016, and the last precipitation happened on May 31st, 2016. We harvested, at noon, three different individuals (apparently healthy), for each cultivar. Plants were around 7 years old and we collected root, stem, and leaf samples. Root samples were collected from 0-10 cm depths. Samples were not disinfected, so both epiphytic and endophytic microorganisms might be present. Total RNA was obtained with the protocol proposed by Zeng & Yang (2002) with modifications by Le Provost et al. (2003). Libraries were prepared with 1 µg total RNA using the KAPA Stranded mRNA-Seq kit, following the manufacturer's protocol, isolating only poly(A) tails. Sequencing was performed using Illumina/HiSeq 4000, generating nine libraries for each cultivar with millions of 50bp paired-end reads. Raw data is available in NCBI SRA (accession number PRJNA746623).
Reads were assembled into three separate transcriptomes for each agave cultivar using Trinity v. 2.5.1 (Grabherr et al., 2011). We used Transdecoder v. 5.0.2 (Haas et al., 2013) for ORF prediction, configured to a minimum length of 200 nucleotides. We performed an additional step to select the longest isoform for each locus, considering only those with TPM (Transcript per Million) values greater than 1 and ORF length greater than 255 nucleotides. Expression values were obtained with kallisto v. 0.44.0 (Bray et al., 2016) with 100 bootstraps, returning TPM values. For all further analysis, the mean expression value from the three replicates was used. For each transcriptome, we performed tissue specificity analysis using the tspex program (Camargo et al., 2020) considering a threshold of SPM > 0.95 to identify tissue-specific transcripts. Annotation of the whole transcriptome was performed with PANNZER2 (Törönen, Medlar & Holm, 2018) and remaining unannotated sequences were submitted to a BLASTp against Uniref90 (E-value 1e−5). During the annotation step we identified that along with plant transcripts, many fungal transcripts were assembled, which led us to develop a new pipeline for their correct identification and further analysis.

In-house pipeline to separate plant and fungal transcripts
We used two approaches to extract fungal transcripts from the three plant transcriptomes. Kaiju software v. 1.6.3 (Menzel, Ng & Krogh, 2016) was used to obtain the fungal sequences and to infer the fungal taxonomic classification by comparison with the NCBI/NR database with an E-value threshold of 1e−5. From this classification, count matrices were generated in the format of reads per taxon, which were compared at the phylum and genus levels. Sequences annotated as plants (higher level: Streptophyta division) were excluded. However, manual annotation revealed that many plant transcripts with very high expression were mistakenly classified as fungal sequences, so we decided to use another approach as well.
The second approach was based on the similarities and differences between the plant and fungal protein sequences available in the Uniref90 database. For this, we performed a BLASTx (E-value threshold of 1e−10) of the assembled transcriptome of A. fourcroydes, A. sisalana, and hybrid 11648 against Uniref90 and selected all transcripts that presented at least 70% of the top 10 hits identified as fungi in Taxonomy DB. Only the results from the second approach were considered for subsequent analyses.

Taxonomic classification of the fungal transcripts
Taxonomic classification from the software Kaiju was used as a guide to perform a pipeline to separate the main fungal phyla (Ascomycota and Basidiomycota) in our datasets. We developed a Perl pipeline (Data S1) to test different parameters to optimize this classification. From 1,014 fungal genomes available on Ensembl Fungi release 46 (https://fungi.ensembl.org), we randomly selected 2,000 CDS for Ascomycota and 700 for Basidiomycota to be used as a training and test dataset. These proportions were based on the preliminary analysis with Kaiju software which showed that the fungal transcripts are mostly represented by Ascomycota and Basidiomycota species. Because there were many parameters to consider, being a multivariate set, we had to test the sensitivity of each parameter after fixating one of them.
Thus, we used similarities in nucleotides (BLASTn) and amino acids (tBLASTx) to evaluate the classification capacity (using metrics TPR -true positive rate -and FPR -false positive rate) between the groups ''Ascomycota'', ''Basidiomycota'', ''Asco or Basidio'' or ''Other Fungi''. These 2,700 CDSs were blasted against all CDS from 1,014 fungal genomes (after the exclusion of these 2,700 CDS to avoid self-identification). The parameters of E-value threshold (1e−10 and 1e−20) and the minimum number N of top BLAST hits (top N BLAST hits) were optimized. If a transcript could not be distinguished between Ascomycota and Basidiomycota based on our criteria, it would be classified as ''Asco or Basidio''. Moreover, if the transcript did not present similarity with Ascomycota nor Basidiomycota, but presented similarity with other fungal genomes, it was assigned to ''Other Fungi''.

Orthologous gene analysis
To compare the three fungal communities' nucleotide sequences, an orthologous analysis was done with software OrthoFinder v. 2.5.2 (Emms & Kelly, 2019) configured with the parameter ''-d'' (nucleotide similarity). Then, we counted the number of orthologous nucleotide families exclusive to (orphan genes) or shared between Ascomycota and Basidiomycota.

Ribosomal RNA identification
Ribosomal RNA identification was performed using the full set of raw sequence reads. We used SortMeRNA (Kopylova, Noé & Touzet, 2012) with default parameters to filter the reads that had similarities to the 28S rRNA database. Then we filtered the reads belonging to fungi to assess percentages of fungal reads in our datasets. Finally, we listed the top 5 fungal species that had more abundant reads in each RNA-seq library.

Functional annotation of fungal transcripts and enrichment tests
Fungal transcripts from the three plants were further annotated with RPS-Blast (E-value threshold of 0.01) using the CDD database (Lu et al., 2020) as reference. The EggNOGmapper program was used (Huerta-Cepas et al., 2019) to identify KEGG pathway (KO) groups (Kanehisa et al., 2019). Using an R script (Data S1), each fungal transcript was grouped considering the annotation of CDD and KO and divided between ''Ascomycota'', ''Basidiomycota'', ''Asco or Basidio'', and ''Other Fungi'' groups. Significant KEGG terms and CDDs were detected by the hypergeometric test using all fungal transcripts as background, being accepted those with p-value < 0.05.

Transport proteins, CAZymes, and secreted proteins prediction
Membrane transport proteins were identified using BLASTp against the curated Transporter Classification Database (TCDB) (Saier et al., 2016) with a threshold E-value of 1e−5 and an alignment coverage of at least 70%. To identify carbohydrate-active enzymes (CAZymes), we used the dbCAN software (Yin et al., 2012). We also used SignalP v. 5.0 (Almagro Armenteros et al., 2019) to identify sequences of signal peptides in all fungal transcripts, with a threshold of score >0.5 for secreted proteins.

Heat shock protein orthologous analysis
We compared heat shock proteins (HSPs) between our six fungal datasets (Ascomycota and Basidiomycota sequences from A. fourcroydes, A. sisalana, and hybrid 11648) and all Ascomycota and Basidiomycota available genomes in Ensembl Fungi release 46. To do so, we searched for HSP domains using the manually curated database HSPIR (Ratheesh Table 1 Summary of assembly parameters. Plant transcriptome assemblies were performed with Trinity and ORF prediction was performed using Transdecoder (minimum length of 200 nucleotides). All transcripts were submitted to a BLASTx against Uniref90 for fungal transcript assignment. Transcripts with at least 70% of the top 10 hits identified as fungi in the Taxonomy DB were considered fungal ones.  Kumar et al., 2012), which has sequences from the six major groups of HSPs. To obtain all heat shock proteins in our fungal datasets and the fungal genomes (1,014 genomes), we searched with HMMER (Eddy, 2008) with an E-value threshold of 0.001. Then, we selected the top 10 genomes for Ascomycota and the top 10 for Basidiomycota that had more members of HSP families (see the distribution in Fig. S2). Using only the HSPs sequences from these 20 datasets and our fungal community datasets divided into Ascomycota and Basidiomycota for each cultivar, we ran a protein orthologous analysis using OrthoFinder (Emms & Kelly, 2019).

Assignment of transcripts to the genus Talaromyces
We did a BLASTn (E-value < 1e−10) of all Ascomycota transcripts against a set of concatenated genomes of Talaromyces available at Ensembl Fungi release 46. We selected only blast hits with alignment coverage above 70%.  (Table S1). Fungal transcript and protein sequences for each cultivar are available in Data S2. Fungal transcripts were largely root-specific (i.e., specificity measure, SPM > 0.95). Root-specific fungal transcripts were more abundant than root-specific plant transcripts in the three agave transcriptomes (Fig. 1A) (around 58% of all root-specific transcripts belong to fungi). Nevertheless, there are six, three and nine expressed transcripts in other tissues (for A. fourcroydes, A. sisalana, and hybrid 11648, respectively). These transcripts present high expression values in two or three tissues (Table S1). Among them, there are some annotated as heat shock proteins. Indeed, the most expressed transcript in hybrid  Nucleotide orthologous analysis between the three communities showed they share a very similar core of expressed transcripts, only presenting a few exclusive gene families ( Fig. 1B; Table S2). As expected, due to the greater amount of Ascomycota transcripts, we found a higher proportion of Ascomycota-exclusive families (square brackets) compared to Basidiomycota (curled brackets); however, the opposite occurs in hybrid 11648-exclusive families (five and 18 for Ascomycota and Basidiomycota, respectively). Furthermore, we found that more than half of all transcripts are orphans (1,484, 2,702, and 1,619 for A. fourcroydes, A. sisalana, and hybrid 11648, respectively), i.e., don't form orthologous clusters with any other gene.

Taxonomic analysis
Taxonomy inference of the whole transcriptomes using the Kaiju software classified 12% ± 1.5 (values are the mean for the three plant communities) as fungal transcripts, of which 68% ± 5.5 were Ascomycota and 30% ± 5.3 Basidiomycota. The most represented Ascomycota genera were Talaromyces (9% ± 1.5) and Corynespora (4% ± 0.2). For Basidiomycota, there was resolution only to classify until order, of which the main were Agaricales (6% ± 2.5) and Auriculariales (3% ± 1.4). However, this approach misclassified many plant transcripts as fungal ones, so we used a Perl pipeline to classify Table 2 Transcript numbers in each fungal group. Transcripts were identified with our in-house pipeline described in the Methods section. ''Asco or Basidio'' refers to transcripts that were either Ascomycota or Basidiomycota but could not be classified. transcripts between Ascomycota and Basidiomycota, as these were the most abundant phyla (information obtained with Kaiju). This new approach was based on sequence similarity using 1,014 public fungal genomes and their respective coding sequences (more details in the Methods section). Results of the parameter optimization step are on Data S3. After parameter optimization, we applied the pipeline to classify the fungal transcripts of the three cultivars into the ''Ascomycota'', ''Basidiomycota'', ''Asco or Basidio'', and ''Other Fungi'' groups (Table 2; Table S1).

Ribosomal RNA analysis
To confirm the previous results, we returned to the raw reads and searched the rRNA fungal dataset with SortMeRNA. The rRNA analysis confirmed that root samples presented much higher percentages of fungal rRNA (between 6% and 20%) than leaf and stem (below 1%) ( Table S3). The most abundant groups in the roots were Parastagonospora, Talaromyces, and Cryptococcus, while in the other tissues the most abundant were Cryptococcus and Verrucaria. We could also confirm that Ascomycota and Basidiomycota are the most represented groups and that the Talaromyces genus is in the top five abundant groups.

Functional annotation of the fungal transcripts
Functional annotation analysis is shown in Fig. 2 as frequencies of statistically significant (p-value < 0.05) enriched KEGG pathways (Figs. 2A and 2C) and conserved domains (CDD) (Figs. 2B and 2D). Generally, Ascomycota presented more categories than Basidiomycota, as Ascomycota has more transcripts.
The most frequent enriched KEGG pathway in both Ascomycota and Basidiomycota was related to chaperones and folding catalysts, absent only in the A. fourcroydes Basidiomycota community. Some pathways suggest interaction with the roots, related to transport to outside the cell, such as exosome (present in the three Ascomycota communities, but exclusive in Basidiomycota in A. sisalana), membrane trafficking, and transporters. Other routes could be associated with root development and elongation, such as Citrate Cycle (TCA) and Glyoxylate and dicarboxylate metabolism, observed exclusively in the samples of A. fourcroydes for Basidiomycota. Still regarding metabolism, Glycosylphosphatidylinositol (GPI) is present in all three datasets of Ascomycota and GTP-binding protein is exclusive of Basidiomycota in A. sisalana.
In regard to CDD, there were also domains related to chaperones and heat shock proteins. The most frequent domain present in all communities was ACD sHSPs-like (CDD:107221), a subunit of small heat shock proteins (HSP), that plays an important role in stress protection, and is found in prokaryotes and eukaryotes alike (Ganea, 2001  were enriched in just one community, such as ''DnaJ' ' (CDD:199909) in hybrid 11648 in Ascomycota, which is also an HSP domain. For Basidiomycota, ''HSP90'' (CDD:333906) was exclusive in A. fourcroydes and ''molecular chaperone DnaK'' (CDD:234715) in hybrid 11648. Regarding primary metabolism, in the Ascomycota communities of A. sisalana and hybrid 11648 we observed ''mannitol dehydrogenase (MDH)-like'' (CDD:187610), which is responsible for catalyzing the conversion of fructose to mannitol. We also found the domain ''Fungal hexose transporter'', which is specific to the hybrid 11648 Basidiomycota community.

Identification of transport proteins and carbohydrate-degrading enzymes
To deeper investigate transporters, we blasted our fungal transcripts against the curated TCDB. We found 145, 201, and 157 proteins related to transport with alignment coverage >70% in A. fourcroydes, A. sisalana, and hybrid 11648, respectively (Table S4). The profile of most frequent families is similar between the three communities (Table 3), although ''Major Facilitator Superfamily (MFS)'' only appears in Ascomycota of A. sisalana, and Family''). In tumor cells, members of the Hsp70 and Hsp90 families can be found in association with membranes, along with co-chaperones, regulating functions related to folding and trafficking (Gross et al., 2003;Heider et al., 2021). Another family with many highly expressed transcripts in all communities is ''Endoplasmic Reticular Retrotranslocon (ER-RT)'', which is related to transport to and from the endoplasmic reticulum, mostly for degradation of misfolded proteins (Römisch, 2005). Overall, both Ascomycota and Basidiomycota presented highly expressed transporters ( Carbohydrate Active enZYmes (CAZymes) were identified among all fungal transcripts and compared between Ascomycota and Basidiomycota (Fig. 3A). The majority were classified as glycoside hydrolases (GH) and the profile between Ascomycota and Basidiomycota in each plant is different, although the pattern is similar when comparing the same phyla. Considering percentages, Ascomycota presented more GHs than Basidiomycota, but the latter had more enzymes with auxiliary activities (AA), and polysaccharide lyases (PL) were exclusive to them. The majority of these CAZymes can be related to the fungi's own carbohydrate metabolism.
To check for CAZymes that might be related to fungal-plant interactions, we compared secreted CAZymes of the three fungal communities to the ones secreted by the plant roots, considering root-specific transcripts (Fig. 3B). For Ascomycota, only A. sisalana presented types of CAZymes other than GHs. The general pattern between the three plants does not have a lot of variation except for A. sisalana, which presents more GHs. Focusing only on the fungal secreted CAZymes (Table 4), Basidiomycota secreted more CAZymes than Ascomycota, and GH128 is exclusive of Basidiomycota. Interestingly, most secreted CAZymes can be related to either plant cell wall degradation (GH10, GH11, GH16, GH17, GH43, AA9, and CE1) or fungal cell wall degradation (GH18 and GH128). Furthermore, hybrid 11648 presented GH10 and GH11 exclusively, which are related to the degradation of hemicellulose.

Heat shock protein orthologous analysis
Chaperones and heat shock proteins (HSPs) were one of the categories and pathways enriched and most frequent in the functional analysis, so we decided to carry out an analysis comparing our fungal communities with other fungal genomes. In our data, HSPs represented 6.21, 4.92, and 5.71% (5.61% average) of the total fungal transcripts To compare these numbers to other fungi, we did a protein orthologous analysis with HSPs prospected in 1,014 fungal genomes (Ensembl Fungi) and selected the top 10 genomes for Ascomycota and the top 10 for Basidiomycota with more HSPs. The list of these fungi can be found in Fig. 4. On the orthologous analysis, there were 165 HSP orthologous families (Table S5). Among these, only 22 contained at least one protein from  our agave datasets (Fig. 4), of which 4 are exclusive (no orthologous in the other fungi's genomes), presenting a minimum coverage of 70% compared to the expected length of the HSP type. These exclusive families were all annotated as ACD (alpha-crystallin domain), a domain present in small HSPs. Also, the family with more proteins from our datasets (OG0000001), which are more abundant in Ascomycota (a mean of 50 transcripts in the three sets), was annotated as ACD. The family with more proteins (419), considering all datasets, was annotated as DnaJ (OG000000), a type of co-chaperone that acts helping the folding performed by Hsp70 (Genest, Wickner & Doyle, 2019), although it is not abundant in our agave datasets, especially in Basidiomycota.
The number of transcripts in each family varied little between the six analyzed communities, showing the similarity between the expressed major HSP groups. However, regarding expression values, these numbers varied a lot inside the families, with only a few transcripts with expression above 10 TPM and the vast majority (83.9%) with values between 0.77 and 9.87 TPM. The families with more highly expressed transcripts are OG0000001, OG0000002, OG0000008, OG0000030, and OG0000114 (Table S5).
We also analyzed all HSPs from the fungal genomes, as we did not find any studies performing such comparison. Considering all fungi species analyzed, the types of HSP with more protein orthologous families were Hsp100, Hsp70, and DnaJ with 78, 75, and 60 families, respectively. The type of HSP with fewer families was Hsp90 (8 families). Curiously, some large Hsp100 families are exclusive (or almost) of some genomes and do not contain any representant in our fungal transcripts. For instance, Amanita muscaria has a family with 28 proteins (and 1 from Piloderma croceum, OG0000025), Galerina marginata has one with 12 (and 1 from A. muscaria, OG0000064), another with 15 proteins (OG0000056), and Exidia glandulosa has one with eight proteins (OG0000072). Another interesting family is OG0000005, annotated as Hsp70; it has many proteins in 8 Basidiomycota genomes (32 in Serendipita vermifera,24 in A. muscaria,17 in G. marginate, and other Basidiomycota with less than 10) and only one protein in an Ascomycota genome (Fusarium oxysporum).

Transcript assignment to the Talaromyces genus
We identified which transcripts belong to the genus Talaromyces, as the taxonomic analysis with Kaiju pointed it as the most represented genus in Ascomycota. The percentage of Talaromyces transcripts were 9.7, 15.3, and 10.8% (total of 291, 662, and 371) for A. fourcroydes, A. sisalana, and hybrid 11648, respectively (Table S6). These results are in accordance with Kaiju. Higher expression transcripts were annotated as HSPs, or have functions related to protein repair, translation, or energetic metabolism. Curiously, in the three communities there were non-root-specific transcripts. In A. fourcroydes, there is an Hsp90 expressed in all tissues, whereas in A. sisalana and hybrid 11648 there are Hsp70 and a ''translation elongation factor 1'' expressed in all tissues. This suggests that Talaromyces might be present in all agave tissues in this region.

DISCUSSION
In this study, we used bioinformatics tools to predict potential molecular functions of transcripts of fungal communities found on the transcriptomes of three agave cultivars. The percentage of fungal transcripts is consistent with a previous report that 21% of transcripts from stem samples in Eucalyptus grandis did not map to the genome of the host plant (Messal et al., 2019). Still, it is remarkable that there are more fungal root-specific transcripts than plant ones in the analyzed agave cultivars (Fig. 1A). These fungi could be epiphytic or endophytic. In the latter case, they could also be localized in the velamen region of roots. In A. sisalana, the velamen is composed of four layers of cells that have an irregular shape and absorbent hairs are found in groups on the roots, where associations with fungal hyphae may occur (Neto & Martins, 2012).
The three fungal communities we analyzed are taxonomically similar to each other, probably because all the plants were grown on the same field and climate conditions. This agrees with previous findings that, especially in agaves, geography matters more in fungal than in prokaryotic communities (Coleman-Derr et al., 2016). We noticed a large number of orphan transcripts in the orthologous analysis, which can indicate that these fungi are using different metabolisms in each plant.
Classified fungi mainly belong to the phyla Ascomycota and Basidiomycota and are present in similar proportions and numbers across the plants (Fig. 1B; Table S1). Most transcripts were classified as Ascomycota, as compared to Basidiomycota. Similarly, Citlali et al. (2018) reported, in four Agave species, a much higher proportion of Ascomycota fungi than from other phyla and a low abundance of arbuscular mycorrhizal fungi (AMF). In our data, only around 1.7% of conserved genes were assigned to AMF (Table S7), although AMF were already described in Agave species (Pimienta-Barrios, Zanudo-Hernandez & Lopez-Alcocer, 2009;Quinones-Aguilar et al., 2016;Montoya Martinez et al., 2019). To complement the results, we performed an rRNA identification analysis using the raw reads. The most abundant fungal genera in the roots were Parastagonospora, Talaromyces, and Cryptococcus. Indeed, our results pointed Talaromyces as the most abundant genus from Ascomycota (11.9% of Ascomycota transcripts) ( Table S6). As our resolution would be too low to classify the transcripts into lower taxonomic ranks, we performed all other analyses at the phylum level.
As our main interest was finding whether the fungi could be interacting with the plants, we focused on finding molecular functions related to such interactions. To do so, we investigated enriched KEGG pathways and CDD domains (Fig. 2). One of the most frequent enriched KEGG pathways for all Ascomycota and A. sisalana Basidiomycota communities was the exosome, which plays roles in cell communication and nutrient delivery. Similarly, the membrane trafficking pathway was also enriched in the same groups. Associated with signaling pathways, transport of small molecules, and metabolic processes (Geisler, Murphy & Sze, 2013), it could be related to plant-microbiome interactions. All these enriched pathways related to transport are more frequent in Ascomycota than Basidiomycota and looking at the numbers of transcripts annotated as transporters (Table S4), Ascomycota indeed presents more transporters.
Analysis of secreted CAZymes suggests interactions between the fungi and the host plants, as endophytes must break the plant cell wall (PCW) to colonize the host. PCWs are mainly formed by cellulose, hemicellulose, lignin, and pectin, which form the first barrier against pathogens and other abiotic stresses (Benoit et al., 2015). Both Ascomycota and Basidiomycota in all three communities presented CAZymes related to degradation of PCWs, such as GH16, GH17, GH43, and AA9 (Table 4) suggesting that they could be colonizing the plant roots. Most of them were annotated as glycoside hydrolases, which is the family that has cellulolytic and hemicellulolytic enzymes. On the other hand, they also presented chitinases (GH18), related to the degradation of fungal cell walls. Secreted chitinases can be used as a defense mechanism against other fungi or plant pathogens, in the case of a symbiont relationship (Aranda-Martinez et al., 2016;Yang et al., 2019). More experiments are needed to address these aspects in agave roots. Although there is a lot of variety in the distribution and quantity of CAZymes in fungal genomes, symbiotic fungi on average have fewer CAZymes (Zhao et al., 2013).
Because agaves are well adapted to dry environments and the cultivars sampled in this study were going through a period of irregular precipitations, it was expected to find expressed transcripts related to abiotic resistance among the fungi. We found enriched protein domains and KEGG pathways related to chaperones and heat shock proteins, which are among the top expressed transcripts and the most frequent transporter families. HSPs have many different functions, many related to protein folding in some way. Particularly in fungi, they act in stress resistance, sporulation, sexual/asexual development, and virulence (Bui et al., 2016;Chatterjee & Tatu, 2017). Additionally, HSPs were also important drug targets in fungal-caused diseases (Lamoth, Juvvadi & Steinbach, 2016). One of their main functions, however, as the name suggests, is related to temperature stress. Our samples were collected at noon in a very dry region, and in similar environments the soil surface can reach over 40 • C (Nobel, 2010;Sattari, Dodangeh & Abraham, 2017). Therefore, it seems reasonable to find many HSPs as top expressed transcripts and within enriched categories, although more experiments are needed to test if they contribute to temperature stress responses.
To further investigate the different HSPs found in our datasets and to compare them to other fungi, we did an HSP orthologous analysis. The main type of HSP found in our agave fungal transcript datasets was small HSPs (Fig. 4), annotated as ACD (alpha-crystallin domain). ACD is a conserved domain through evolution, although the whole sequence of small HSPs varies (Kriehuber et al., 2010). Small HSPs can act in response to temperature stress; some of them were described as conferring tolerance to freezing (Pacheco et al., 2009) and heat shock (Haslbeck et al., 1999) in Saccharomyces cerevisiae. Also, small HSPs were described as the first defense against many stresses (Haslbeck & Vierling, 2015). In filamentous fungi, the number of copies of small HSPs in their genomes does not vary much (3-5 copies each) and they diverge a lot across fungal species (Wu et al., 2016). This could explain the difference between the number of proteins clustered in the other fungal genomes analyzed and our fungal transcripts (OG0000001), as maybe our samples presented closely related species grouped. However, there are more ACD-annotated proteins in Ascomycota, whereas the Basidiomycota fungal genomes present more copies of ACDs. This suggests that even if the Basidiomycota fungi present in the agave community also have many copies in their genomes, they are not expressing them.
To our knowledge, there are no works comparing HSPs across these fungal genomes. In this regard, one important protein family for the fungal genomes, both in Ascomycota and Basidiomycota, but not so present in our fungal transcripts, was DnaJ (Table S5). There are 60 families of DnaJ, being the third type of HSP more abundant in the whole dataset. DnaJ proteins confer stress protection, playing key roles in the cell death cycle and resistance to diseases (Liu & Whitham, 2013). The families with more proteins assigned to were Hsp100 (78 families) and Hsp70 (75 families). Hsp70 and Hsp100 both form a complex that acts in the disaggregation of other proteins, which could explain the similar number of transcripts belonging to these families in our agave datasets (OG0000002 and OG0000003). All HSPs were more uniformly distributed through Ascomycota genomes, but Hsp100 presented many exclusive or almost exclusive families in many Basidiomycota genomes. Thus, Hsp100 are seemingly important chaperones for some Basidiomycota.

CONCLUSIONS
In this study, we identified and characterized fungal transcripts found in the transcriptomes of three agave cultivars grown in the Brazilian semiarid without irrigation and under an irregular precipitation regime. We found more fungal root-specific transcripts than plant ones. These fungi mainly belong to two different phyla which are performing somewhat distinct functions, many related to interactions with the host plant and others related to drought resistance. Microbial communities can contribute to increasing the host plant's resistance to many biotic and abiotic factors. Therefore, the current study underlines the importance of analyzing possible ''contaminants'' that may appear in transcriptome datasets, as valuable information might be present. In summary, our exploratory analysis of fungal communities provides a starting point to the prospection of potential microorganisms that could be exploited to generate improved agronomical characteristics in agave or other cultures.
• Maria Fernanda Zaneli Campanari and Marcelo Falsarella Carazzolle conceived and designed the experiments, performed the experiments, analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Fabio Trigo Raya and Gonçalo Amarante Guimarães Pereira conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability: The sequences are available at NCBI SRA: PRJNA746623. The assembled transcripts and protein sequences are available in the Supplemental File.